KoGraR: Standardized Statistical Analyses of Corpus Counts
Abstract
Within the project “Corpus grammar” (Korpusgrammatik) at the Institute for the German Language (Institut für Deutsche Sprache, IDS) in Mannheim, techniques and tools are being developed for describing grammatical phenomena on the basis of very large, morphosyntactically annotated corpora. The goal of the project is a corpus-based grammar that captures variation in grammatical structure in present-day German. In the first project phase, pilot studies were conducted (cf. Bubenhofer et al., 2014; Fuß, 2014; Konopka, 2014) to exploit and evaluate various methodological approaches to variation phenomena. For each research question, statistical analyses were chosen and customized. From these analyses, a subset was extracted as the methodological core of the project, with the aim of supporting methodological coherence, interoperability of sub-projects and, finally, the descriptive coherence of the project result, that is, the grammar. The methodological core has been made available to project members via an easy-to-use web front-end called KoGraR, in which the results of corpus queries and other user-defined data tables can be uploaded and analyzed automatically.
Publication date: 2015